Skip to content

Dump activation shardings#3080

Merged
copybara-service[bot] merged 1 commit into AI-Hypercomputer:main from
CIeNET-International:charlesli/input_sharding
Feb 26, 2026
Merged

Dump activation shardings#3080
copybara-service[bot] merged 1 commit into AI-Hypercomputer:main from
CIeNET-International:charlesli/input_sharding

Conversation

@charlesli640
Copy link
Collaborator

@charlesli640 charlesli640 commented Feb 4, 2026

Description

To dump activation shardings to a golden file for further comparison. This can be included in a unit test in case a future code change touches activation shardings. This PR is the initial submission for the sharding dump JSON files.

Output

The output format is readable and comparable by both humans and machines. For example, the tests deepseek2-16b/v5p-16/slice_1 activation dump is shown below:

{
  "Activation Sharding Dump": [
    {
      "deepseek/inputs: bfloat16[96,2048,2048]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "deepseek/pre_attention_norm: bfloat16[96,2048,2048]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "attention_mla/inputs_q: bfloat16[96,2048,2048]": {
        "logic_axes": "('activation_batch', 'activation_length_no_exp', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "attention_mla/inputs_kv: bfloat16[96,2048,2048]": {
        "logic_axes": "('activation_batch', 'activation_length_no_exp', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "attention_mla/q_nope: bfloat16[96,2048,16,128]": {
        "logic_axes": "('activation_kv_batch', 'activation_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention_mla/q_pe: bfloat16[96,2048,16,64]": {
        "logic_axes": "('activation_kv_batch', 'activation_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention_mla/query: bfloat16[96,2048,16,192]": {
        "logic_axes": "('activation_kv_batch', 'activation_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention_mla/key_nope: bfloat16[96,2048,16,128]": {
        "logic_axes": "('activation_kv_batch', 'activation_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention_mla/key_rope: bfloat16[96,2048,16,64]": {
        "logic_axes": "('activation_kv_batch', 'activation_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention_mla/key: bfloat16[96,2048,16,192]": {
        "logic_axes": "('activation_kv_batch', 'activation_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention_mla/value: bfloat16[96,2048,16,128]": {
        "logic_axes": "('activation_kv_batch', 'activation_length_no_exp', 'activation_kv_heads', 'activation_kv_head_dim')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention_op/query: bfloat16[96,16,2048,192]": {
        "logic_axes": "Unknown",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention_op/key: bfloat16[96,16,2048,192]": {
        "logic_axes": "Unknown",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention_op/value: bfloat16[96,16,2048,128]": {
        "logic_axes": "Unknown",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "attention_mla/out: bfloat16[96,2048,16,128]": {
        "logic_axes": "('activation_batch', 'activation_length_no_exp', 'activation_heads', 'activation_kv')",
        "PartitionSpec": "P('fsdp', None, None, None)"
      }
    },
    {
      "deepseek/attention_result: bfloat16[96,2048,2048]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "deepseek/post_attention_norm: bfloat16[96,2048,2048]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "linears/x: bfloat16[96,2048,10944]": {
        "logic_axes": "('activation_batch', 'activation_length_no_exp', 'activation_mlp')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "deepseek/mlp: bfloat16[96,2048,2048]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "deepseek/x: bfloat16[96,2048,2048]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "moe/inputs: bfloat16[96,2048,2048]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', None)",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "moe/gate_logits: bfloat16[96,2048,64]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', None)",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "linears/x: bfloat16[96,2048,2816]": {
        "logic_axes": "('activation_batch', 'activation_length_no_exp', 'activation_mlp')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    },
    {
      "deepseek/mlp_lnx: bfloat16[96,2048,2048]": {
        "logic_axes": "('activation_batch', 'activation_norm_length', 'activation_embed')",
        "PartitionSpec": "P('fsdp', None, None)"
      }
    }
  ]
}

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov
Copy link

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 84.21053% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/maxtext/utils/sharding.py 80.00% 3 Missing and 3 partials ⚠️

📢 Thoughts on this report? Let us know!

@gobbleturk
Copy link
Collaborator

I think this LGTM although there are a lot of names to review! How did you generate these names?

@charlesli640 charlesli640 marked this pull request as draft February 5, 2026 01:01
@charlesli640 charlesli640 force-pushed the charlesli/input_sharding branch from 4b17fdb to 511be4b Compare February 5, 2026 17:59
@charlesli640
Copy link
Collaborator Author

I think this LGTM although there are a lot of names to review! How did you generate these names?

These names are generated locally from <file_name>/<variable_name>. Sometimes a name may not correctly reflect the actual model/layer, but it basically serves as an identifier/key for logging, dumping, and comparison purposes.

@charlesli640 charlesli640 force-pushed the charlesli/input_sharding branch 5 times, most recently from 8071699 to 486ebfb Compare February 10, 2026 18:49
@charlesli640 charlesli640 marked this pull request as draft February 12, 2026 23:10
@charlesli640 charlesli640 force-pushed the charlesli/input_sharding branch 6 times, most recently from 660b637 to 504c66a Compare February 19, 2026 18:25
@charlesli640 charlesli640 marked this pull request as ready for review February 19, 2026 18:25
@charlesli640 charlesli640 force-pushed the charlesli/input_sharding branch from 504c66a to afc474b Compare February 19, 2026 18:29
Copy link
Collaborator

@NuojCheng NuojCheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Charles! Just some minor comments

@charlesli640 charlesli640 force-pushed the charlesli/input_sharding branch 12 times, most recently from 31860b0 to 5a1e8c9 Compare February 25, 2026 17:41
Copy link
Collaborator

@richjames0 richjames0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Using inspect to get call stacktrace

Cmd to generate input_shardings.json files:
  python -m tests.utils.run_sharding_dump
@charlesli640 charlesli640 force-pushed the charlesli/input_sharding branch from d7a2f5f to d4bc454 Compare February 26, 2026 19:06
@copybara-service copybara-service bot merged commit 0b6a8d3 into AI-Hypercomputer:main Feb 26, 2026
44 of 49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants